04. Recovering all your systems
Monitoring and response are core to every vital system. When you architect a platform, you should think early in the design process about how you will know if something is wrong with it. Many different kinds of monitoring can be applied to the different facets of a system, and knowing which types to apply where can be the difference between success and failure.
Always ask yourself how you would diagnose issues with an application: how would you understand its health, what are its choke points, how would you identify them, and what would you do when something breaks? While thinking through these questions is important, it is very difficult to foresee every possible scenario. This is why advanced organizations employ techniques like "chaos engineering" to intentionally cause breakage in their environments in a controlled manner. If you have built a resilient system, it should survive the loss of a component, so why not terminate a random server and find out? It may be hard to get accustomed to this idea, but it can provide insight that would otherwise be impossible to gain.
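To make the "terminate a random server" idea concrete, here is a minimal sketch of a controlled chaos experiment. It runs entirely in memory: the `Server` class, the `min_healthy` threshold, and every function name are hypothetical stand-ins for illustration, not part of any real cloud API or chaos tool. A real experiment would call your provider's tooling and read your actual monitoring signals instead.

```python
import random


class Server:
    """Hypothetical stand-in for a real instance in a server pool."""

    def __init__(self, name):
        self.name = name
        self.running = True


def healthy_count(pool):
    """Count the servers that are still running."""
    return sum(1 for server in pool if server.running)


def service_is_healthy(pool, min_healthy):
    """Assumed health rule: the service is up if enough servers are running."""
    return healthy_count(pool) >= min_healthy


def terminate_random_server(pool, rng):
    """Stop one running server chosen at random; return it, or None if none are running."""
    candidates = [server for server in pool if server.running]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False
    return victim


def run_experiment(pool, min_healthy, rng=None):
    """Run one controlled experiment and report whether the service survived."""
    if rng is None:
        rng = random.Random()
    # Control: never inject a failure into a service that is already unhealthy.
    if not service_is_healthy(pool, min_healthy):
        raise RuntimeError("refusing to start: service is already unhealthy")
    victim = terminate_random_server(pool, rng)
    return {
        "terminated": victim.name if victim else None,
        "healthy_after": healthy_count(pool),
        "survived": service_is_healthy(pool, min_healthy),
    }


if __name__ == "__main__":
    pool = [Server(f"web-{i}") for i in range(3)]
    print(run_experiment(pool, min_healthy=2))
```

The precondition check is what keeps the breakage controlled: the experiment only starts from a known-good state, so any failure afterward can be attributed to the terminated server. With three servers and a threshold of two, the service should survive; with only two servers, the same experiment exposes a pool that has no spare capacity.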
Practicing
SOLUTION:
To practice monitoring your production environments
Disrupt production